# Scripts that Aggregate and Rearrange Data

This folder contains scripts that aggregate and rearrange data for later use. The order in which the scripts are listed indicates potential dependencies.
* \* denotes particularly important scripts.
* ^- denotes scripts that one can ignore for now.

## Note on Variable Names

Variables in Data Sets:

* A variable name usually starts with the physical unit of the variable.
	* Examples: `emp_ind` = employment in the industry. `ln_emps_ind_10` = ln(employment share of top 10% industry-firms in industry).
* What follows usually characterizes 1) the subset of data with which the value is calculated; 2) whether the variable is a difference (denoted by a stand-alone `d`).
	* Examples: `ln_emps_ind_d_10_1977_2013` = Change in ln(employment share of top 10% industry-firms in industry) between 1977 and 2013.
	
Some Abbreviations Used:

* `emp` = Employment, `est` = Establishment, `pay` = Payroll, `sales` = Sales.
* `msa` = MSA, `fips` = County, `czone` = Commuting Zone, `zip` = ZIP Code.
* `ch_ind` = Our industry classification, `sector` = Our sector.
* `indf` = Industry-firm level, `ind` = Industry level, `cind` = City-industry level, `city`/`c` = City level, `agg` = Aggregate level.
* `d` = Difference, `r` = Ratio.
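
The naming convention above can be illustrated with a small decoder. This is a minimal sketch and not part of the scripts in this folder: the helper names (`describe`, `years`, `is_difference`) are hypothetical, the `ln` and `emps` entries are inferred from the examples above rather than from the abbreviation list, and treating any four-digit token as a year is an assumption. Multi-token abbreviations such as `ch_ind` are not handled.

```python
# Hypothetical helper for reading variable names; not part of the project scripts.
ABBREVIATIONS = {
    "emp": "employment", "est": "establishment", "pay": "payroll", "sales": "sales",
    "msa": "MSA", "fips": "county", "czone": "commuting zone", "zip": "ZIP code",
    "indf": "industry-firm level", "ind": "industry level",
    "cind": "city-industry level", "city": "city level", "c": "city level",
    "agg": "aggregate level", "d": "difference", "r": "ratio",
    # Inferred from the examples above, not from the abbreviation list:
    "ln": "natural log", "emps": "employment share",
}


def _is_year(token):
    # Assumption: any four-digit token is a year (e.g. 1977, 2013).
    return token.isdigit() and len(token) == 4


def describe(name):
    """Label each underscore-separated token of a variable name."""
    labels = []
    for token in name.split("_"):
        if _is_year(token):
            labels.append((token, "year"))
        elif token.isdigit():
            # Assumption: other numbers are cutoffs such as "top 10%".
            labels.append((token, "number"))
        else:
            labels.append((token, ABBREVIATIONS.get(token, "unknown")))
    return labels


def years(name):
    """Return the years embedded in a variable name, in order."""
    return [int(t) for t in name.split("_") if _is_year(t)]


def is_difference(name):
    """True if the name contains a stand-alone `d` token."""
    return "d" in name.split("_")


if __name__ == "__main__":
    for token, label in describe("ln_emps_ind_d_10_1977_2013"):
        print(f"{token:>6}  {label}")
```

For `ln_emps_ind_d_10_1977_2013`, `is_difference` is true because of the stand-alone `d`, and `years` recovers the interval endpoints 1977 and 2013.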

## Description of General Purpose Scripts

These scripts serve some basic functions.

| Script | Description |
|:---------|:---------|
| `data_0_read_raw.sas` | Script that reads in and subsets the cleaned LBD data for use in this project. |
| `m_read.sas` | Macro that either runs `data_0_read_raw.sas` or reads in data saved by `data_1_save_all.sas`. |
| `m_read_sales.sas` | Macro that is similar to `m_read.sas` but reads sales data. |
| `data_1_save_all.sas` | Saves a copy of the cleaned subset of data to save time during development. |

## Description of Scripts I

These scripts were mainly developed before 2020-11, though some may have been modified or updated since.

| Script | Description |
|:---------|:---------|
| `data_0_ind_sum_core.sas`* | Macros that aggregate data to the industry level (e.g. employment of top industry firms). |
| `data_0_cityind_sum_core.sas`* | Macros that aggregate data to the city-industry level (e.g. employment of top industry firms (defined nationally) in MSAs). |
| `data_0_jobcr.sas`* | Macros that calculate the job creation rate. |
| `data_1_agg_sum.sas`^- | Calculates statistics related to top (national) firms. |
| `data_1_ind_sum.sas`* | Calculates statistics related to top industry firms. |
| `data_2_ind_sum_add.do`* | Calculates additional statistics at the industry level. |
| `data_3_sect_mfg.do` | Categorizes manufacturing industries based on average plant size. |
| `data_1_cityind_sum.sas`* | Calculates statistics related to top industry firms defined nationally and locally by locality. |
| `data_1_cityind_sum_msa_czone.sas`* | Calculates statistics related to top industry firms defined nationally and locally by locality for MSA, CZONEs. |
| `data_1_cityind_sum_t1.sas` | Calculates statistics related to the top-1 industry firm in a locality that is also a top 10% industry firm. |
| `data_2_cityind_all_sel.do`* | Calculates additional statistics at the city-industry level (for the selected intervals). |
| `data_1_jobcr_0.sas`* | Macro that calculates job creation rates. |
| `data_1_jobcr_msa/fips/czone/msa_czone.sas`* | Calculates job creation rates by MSA, county, CZONE, or MSA1983CZONE. |
| `m_sum_by_stat_alt.sas` | Alternative way of sum by status. |
| `cw_geo_msa_czone.do` | Generates alternative geo crosswalks. (`msa_czone` is the "hybrid" MSA.) |
| `data_1_cityind_sum_msa_czone.sas`* | Calculates statistics related to top industry firms defined nationally and locally by locality. (Alternative geo variables.) |

## Description of Scripts II*

These scripts have mainly been developed since 2020-11.

| Script | Description |
|:---------|:---------|
| `ec_1_lbd_merge.sas` | Merges EC and LBD data. |
| `data_1_mkt_ind_sum.sas` | Total numbers of markets by industry for all and top firms. Top firms are defined based on market instead of employment. |
| `data_1_mkt_ind_bea_ip.sas` | Average numbers of markets by industry defined in BEA IP data for all and top firms. Top firms are defined based on market instead of employment. |
| `data_1_aux.sas` | Auxiliary employment of establishments identified by FK. |
| `cw_ind_aux.do` | Generates a list of industries including NAICS codes that we consider as HQ (54-55). |
| `data_1_aux_ind_sum.sas` | Auxiliary employment based on our definition (with "imputation"). (Main definitions are 2, 54, 55.) |
| `data_1_aux_ind_top_sum.sas` | Auxiliary employment based on our definition (with "imputation") of top firms (defined based on market). (Main definitions are 2, 54, 55.) |
| `data_1_sales_ind_sum.sas` | Industry concentration by sales. |
| `data_2_sales_ind_sum_add.do` | Exports years to use for sales and generates additional concentration-related variables. |
| `data_1_sales_cityind_sum.sas` | City-industry concentration by sales. |
| `data_2_sales_ind_sum_add.do` | Generates additional concentration-related variables. |
| `data_2_sales_cityind_sel.do` | Calculates additional sales statistics at the city-industry level (for the selected intervals). |
| `data_3_ind_sum_d_sel.do`* | Calculates default weight for industry-level analysis and change variables for selected variables. |

## Description of Scripts III

These scripts were mainly developed from 2021-01 onward by AK.

| Script | Description |
|:---------|:---------|
| `data_0_aux_app_sum.sas` | Macro that calculates auxiliary employment for a generalized industry code (mainly used for the Appendix). |
| `data_0_fknaics_sum.sas` | Macro that calculates industry-level statistics (top firms defined by market and employment) using an alternative industry classification (mainly used for the Appendix). |
| `data_1_app_non_acq_est_top.sas` | Total employment, establishments, etc. after excluding acquired establishments (Appendix). |
| `data_1_app_same_ch_ind.sas` | Industry moments using the original `ch_ind` (Appendix). |
| `data_1_aux_fknaics_sum.sas` | Auxiliary employment for an alternative industry classification (Appendix). |
| `data_1_cityind_top_mkt_sum.sas` | City-industry statistics, with top firms defined by markets per firm. |
| `data_1_fknaics_sum.sas` | Industry-level statistics for alternative industry classifications such as `fknaics`, `naics`, `sic`, and `sector` (Appendix). |
| `data_1_mkt_ind_mktsize.sas` | Average market size for all firms and for top firms, with top firms defined by establishments. |
| `data_1_sales_cityind_top_mkt_sum.sas` | City-industry sales statistics, with top firms defined by markets per firm. |
| `data_1_sales_mgrow.sas` | Calculates missing growth using sales. |
| `data_1_sales_new_ind_sh.sas` | Calculates the sales share of new industries. |
| `data_1_top_ind_firms_agg_sum.sas` | Number of industries served and within-firm HHI for all industry firms. |
| `data_2_sales_cityind_top_mkt.do` | Chooses the appropriate start/end years for the sales data output by `data_1_sales_cityind_top_mkt_sum.sas`. |




